
Chapter 7 Inferential Statistics (Concepts)

Welcome to the fascinating realm of Inferential Statistics, a crucial branch of statistics that empowers us to make informed judgments and draw meaningful conclusions about vast populations based on the analysis of smaller, manageable samples. While Descriptive Statistics, covered previously, focuses on summarizing and organizing the data we have, inferential statistics takes the vital leap towards generalization and decision-making under uncertainty. It provides the methodologies for moving beyond sample observations to make probabilistic statements about underlying population characteristics, forming the backbone of scientific research, quality control, market analysis, and data-driven policy formulation. Understanding the principles herein is essential for interpreting statistical results critically and making evidence-based decisions in numerous applied contexts.

The core idea revolves around the distinction between a population (the entire group of interest) and a sample (a subset drawn from the population). We are typically interested in unknown population parameters, such as the population mean ($\mu$) or population proportion ($p$), but often only have access to data from a sample, from which we calculate corresponding sample statistics, like the sample mean ($\bar{x}$) or sample proportion ($\hat{p}$). The key challenge is to use these sample statistics to make reliable inferences about the population parameters. The validity of these inferences relies heavily on the quality of the sample; hence random sampling techniques, which help ensure the sample is representative of the population, are paramount. Inferential statistics provides the formal framework for bridging the gap between sample information and population truth.

A foundational concept underpinning many inferential procedures is the sampling distribution. Imagine repeatedly drawing samples of the same size ($n$) from a population and calculating a statistic (like $\bar{x}$) for each sample. The probability distribution of all these possible sample statistics is called the sampling distribution. Crucially, the Central Limit Theorem (CLT) provides a remarkable insight: for sufficiently large sample sizes (often cited as $n \ge 30$), the sampling distribution of the sample mean ($\bar{x}$) will be approximately normally distributed, irrespective of the original population's distribution shape. Furthermore, this sampling distribution will have a mean equal to the population mean ($\mu$) and a standard deviation, known as the standard error of the mean, equal to $\frac{\sigma}{\sqrt{n}}$, where $\sigma$ is the population standard deviation. The CLT is powerful because it allows us to use the properties of the normal distribution for inference about $\mu$ even when the population distribution is unknown.
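The CLT can be checked empirically. The following simulation is a minimal sketch in Python with NumPy; the exponential population, the sample size $n = 36$, and the number of repeated samples are arbitrary choices for illustration, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

n = 36                 # size of each sample (illustrative choice)
num_samples = 10_000   # number of repeated samples drawn

# A right-skewed, clearly non-normal population: exponential with mean 2.
# For this distribution, mu = 2 and sigma = 2.
mu, sigma = 2.0, 2.0

# Draw num_samples samples of size n and compute each sample mean x-bar.
samples = rng.exponential(scale=mu, size=(num_samples, n))
sample_means = samples.mean(axis=1)

print("mean of the sample means  :", sample_means.mean())       # close to mu = 2
print("sd of the sample means    :", sample_means.std(ddof=1))  # close to sigma/sqrt(n)
print("theoretical standard error:", sigma / np.sqrt(n))        # 2/6 = 0.333...
```

A histogram of `sample_means` would look approximately normal even though the underlying population is strongly skewed, which is exactly the content of the CLT.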

Inferential statistics primarily encompasses two major areas: Estimation and Hypothesis Testing. Estimation focuses on using sample data to estimate the value of an unknown population parameter. Point Estimation provides a single best guess (e.g., using $\bar{x}$ as an estimate for $\mu$). However, acknowledging sampling variability, Interval Estimation provides a more informative range of plausible values for the parameter, known as a Confidence Interval. We will explore how to construct and interpret confidence intervals, typically for the population mean $\mu$, often using the Z-distribution (if $\sigma$ is known or $n$ is large) or the t-distribution (if $\sigma$ is unknown and $n$ is small). The general structure of a confidence interval is Point Estimate $\pm$ Margin of Error, providing a specified level of confidence (e.g., 95%) that the true parameter lies within the calculated interval.
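As a concrete sketch of the Point Estimate $\pm$ Margin of Error structure (Python with SciPy; the data values below are invented purely for illustration), a 95% confidence interval for $\mu$ with $\sigma$ unknown uses the t-distribution with $n - 1$ degrees of freedom:

```python
import numpy as np
from scipy import stats

# Hypothetical sample data (illustrative values only).
data = np.array([48.2, 51.5, 49.8, 50.9, 47.6, 52.3, 50.1, 49.4])
n = len(data)
x_bar = data.mean()
s = data.std(ddof=1)        # sample standard deviation (divisor n - 1)
se = s / np.sqrt(n)         # estimated standard error of the mean

# 95% CI: point estimate +/- margin of error, using the t-distribution
# because sigma is unknown and n is small.
t_crit = stats.t.ppf(0.975, df=n - 1)
print("95% CI:", (x_bar - t_crit * se, x_bar + t_crit * se))

# Equivalent one-liner provided by SciPy:
print(stats.t.interval(0.95, df=n - 1, loc=x_bar, scale=se))
```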

The second major area is Hypothesis Testing, a formal procedure for making decisions about population parameters based on sample evidence. It involves formulating two competing hypotheses: the Null Hypothesis ($H_0$), typically representing a statement of no effect, no difference, or the status quo, and the Alternative Hypothesis ($H_a$ or $H_1$), representing what the researcher aims to find evidence for. Key concepts include understanding the potential for errors (Type I error: rejecting $H_0$ when it's true; Type II error: failing to reject $H_0$ when it's false), setting a level of significance ($\alpha$) (the probability of a Type I error), calculating a test statistic (a value summarizing the sample evidence, e.g., $Z = \frac{\bar{x}-\mu_0}{\sigma/\sqrt{n}}$), determining the p-value (the probability of observing a test statistic as extreme as or more extreme than the one calculated, assuming $H_0$ is true), and defining a critical region. The decision to reject or fail to reject $H_0$ is made by comparing the test statistic to a critical value or, more commonly, by comparing the p-value to $\alpha$. We will explore basic tests concerning the population mean ($\mu$), such as the one-sample Z-test or t-test. These inferential tools are fundamental for drawing statistically sound conclusions from data.



Population and Sample


In the field of statistics, we are often interested in understanding the characteristics or properties of a large group of individuals, objects, or data points. For example, we might want to know the average income of all households in a city, the effectiveness of a new medicine on all patients with a certain disease, or the quality of all products produced by a factory. This entire group that we are interested in is called the Population.

A Population is the complete set of all possible observations, individuals, units, or data that share a common characteristic of interest for a particular study. Defining the population is the first crucial step in any statistical investigation, as it clearly specifies the group about which conclusions are to be drawn. The size of a population is denoted by $N$.

Examples of populations:

  • All households in a city (when studying average household income).
  • All patients suffering from a particular disease (when studying the effectiveness of a new medicine).
  • All products produced by a factory (when studying product quality).

While our goal is to understand the population, collecting data from every single member of the population (conducting a census) is often impractical, resource-intensive, or even impossible, especially if the population is very large, geographically dispersed, or constantly changing.


Sample

Since studying the entire population is rarely feasible, statisticians resort to studying a smaller, more manageable subset drawn from the population. This subset is called a Sample.

A Sample is a subgroup or subset of the population from which data is actually collected. The data collected from the sample is then used to make generalizations or inferences about the characteristics of the entire population. The size of the sample is denoted by $n$, where typically $n < N$.

Examples of samples corresponding to the populations mentioned above:

  • A randomly selected group of, say, 500 households from the city.
  • A group of patients with the disease who take part in a clinical trial of the new medicine.
  • A randomly chosen batch of products drawn from the factory's output.

The key to making valid inferences about the population from a sample is ensuring that the sample is representative of the population. A representative sample accurately reflects the characteristics of the population from which it was drawn. The best way to obtain a representative sample is typically through random sampling methods, where each member of the population has a known and non-zero chance of being selected for the sample. This helps in avoiding selection bias.


Why Study a Sample Instead of the Population?

There are several compelling reasons why sampling is almost always preferred over conducting a full census of the population:

  • Cost: collecting and processing data from every member of a large population is expensive.
  • Time: a census takes far longer to complete than a sample survey, and its results may be outdated by the time they are ready.
  • Feasibility: some populations are effectively infinite, inaccessible, or constantly changing, making a census impossible.
  • Destructive testing: in quality control, testing an item (e.g., measuring a bulb's lifespan) may destroy it, so testing the whole population would leave nothing usable.
  • Accuracy: a carefully conducted sample survey can yield more accurate data than a hastily conducted census, since resources can be concentrated on data quality.

The goal of Inferential Statistics, which is the subject of this chapter, is precisely to develop methods for using information gathered from a sample to draw reliable conclusions or make inferences about the unknown characteristics of the entire population.



Parameter, Statistic, and Statistical Inference


When we conduct a statistical study, whether we are examining a population or a sample, we are typically interested in measuring certain numerical characteristics that describe the group. These characteristics can be measures of central tendency (like averages), measures of dispersion (like variability), or measures of proportion (like percentages). The terminology used to describe these characteristics depends on whether they pertain to the entire population or just to a sample drawn from that population. This distinction is fundamental to understanding inferential statistics.


Parameter

A parameter is a numerical characteristic that describes some aspect of the entire population. Parameters are considered fixed values for a given population, although in most real-world scenarios, their true values are unknown to us because we rarely have data for the entire population.

Parameters are typically denoted by Greek letters. Some common population parameters include:

  • Population mean ($\mu$)
  • Population standard deviation ($\sigma$)
  • Population variance ($\sigma^2$)
  • Population proportion ($p$, sometimes written $\pi$)

For example, if we are interested in the average height of all adult males in Delhi, the true average height of all adult males in Delhi is the population mean ($\mu$). If we are interested in the percentage of voters in Karnataka who support a particular political party, the actual percentage among all voters in Karnataka is the population proportion ($p$). These values exist, but we usually don't know them exactly.


Statistic

A statistic (or sample statistic) is a numerical characteristic that describes some aspect of a sample. Statistics are calculated directly from the data collected from the sample. Unlike parameters, the value of a statistic is known once the sample data is collected.

Since a sample is only a part of the population, and different samples drawn from the same population are likely to contain different individuals, the value of a statistic can vary from sample to sample. This variability of statistics is a key concept in inferential statistics.

Statistics are typically denoted by Roman letters. Some common sample statistics include:

  • Sample mean ($\bar{x}$)
  • Sample standard deviation ($s$)
  • Sample variance ($s^2$)
  • Sample proportion ($\hat{p}$)

For instance, if we take a sample of 50 adult males from Delhi and calculate their average height, this calculated average is the sample mean ($\bar{x}$). This sample mean is used as an estimate for the unknown population mean height ($\mu$). Similarly, if we survey 500 voters in Karnataka and find that 45% of them support a particular party, then 45% is the sample proportion ($\hat{p}$), which serves as an estimate for the unknown population proportion ($p$).

The table below summarizes the distinction:

| Characteristic | Population (Parameter) | Sample (Statistic) |
|---|---|---|
| Refers to | The entire group of interest | A subset of the population |
| Value | Fixed, but typically unknown | Varies from sample to sample; calculated from the data |
| Notation (examples) | $\mu$ (mean), $\sigma$ (standard deviation), $\sigma^2$ (variance), $p$ or $\pi$ (proportion) | $\bar{x}$ (mean), $s$ (standard deviation), $s^2$ (variance), $\hat{p}$ (proportion) |
| Goal | To describe the population | To describe the sample and make inferences about the population |

Statistical Inference

Statistical inference is the main objective when we collect data from a sample. It is the process of using the information obtained from a sample to draw conclusions, make estimations, or test hypotheses about the characteristics (parameters) of the larger population from which the sample was drawn. Since we are using partial information (from the sample) to make statements about the whole (the population), statistical inference always involves some degree of uncertainty. A key part of statistical inference is quantifying this uncertainty, usually using probability.

The two primary branches of statistical inference are:

Estimation

Estimation involves using sample statistics to estimate the unknown value of population parameters. There are two main types of estimation:

  • Point estimation: a single value computed from the sample (e.g., $\bar{x}$) is used as the best single guess for the parameter (e.g., $\mu$).
  • Interval estimation: a range of plausible values for the parameter (a confidence interval) is constructed around the point estimate, together with a stated level of confidence (e.g., 95%).

Hypothesis Testing

Hypothesis testing (also known as significance testing) is a formal procedure used to evaluate a claim or statement (a hypothesis) about a population parameter using evidence from sample data. It involves a structured process:

  1. Formulate Hypotheses: State a null hypothesis ($H_0$), which is typically a statement of no effect or no difference (e.g., the average income is ₹50,000). State an alternative hypothesis ($H_1$ or $H_a$), which is contrary to the null hypothesis (e.g., the average income is not ₹50,000, or is greater than ₹50,000).
  2. Select Significance Level: Choose a level of significance ($\alpha$), which is the probability of rejecting the null hypothesis when it is actually true (Type I error). Common values are $\alpha = 0.05$ (5%) or $\alpha = 0.01$ (1%).
  3. Collect Sample Data: Obtain a random sample from the population.
  4. Calculate Test Statistic: Compute a value from the sample data (a test statistic, e.g., a t-statistic or z-statistic) that measures how far the sample result deviates from what is expected under the null hypothesis.
  5. Determine P-value or Critical Value: Compare the test statistic to a known probability distribution (like the t-distribution or standard normal distribution) to find the p-value (the probability of observing a sample result as extreme as, or more extreme than, the one obtained, assuming the null hypothesis is true) or compare the test statistic to critical value(s) from the distribution based on the chosen $\alpha$.
  6. Make a Decision: If the p-value is less than $\alpha$ (or if the test statistic falls into the rejection region defined by the critical value(s)), we reject the null hypothesis. Otherwise, we fail to reject the null hypothesis.
  7. Draw Conclusion: State the conclusion in the context of the original problem, interpreting what the decision means regarding the population parameter.

Hypothesis testing allows us to make probabilistic statements about the strength of evidence against the null hypothesis. For example, based on sample data, we might conclude that there is sufficient evidence to reject the claim that the average income is ₹50,000 and infer that it is likely different.
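The seven steps map directly onto a few lines of code. Below is a minimal sketch in Python with SciPy, using the ₹50,000 income claim from the example above; the sample of incomes is invented purely for illustration.

```python
import numpy as np
from scipy import stats

alpha = 0.05           # Step 2: significance level
mu_0 = 50_000          # Step 1: H0 claims the average income is 50,000

# Step 3: a hypothetical random sample of household incomes (made-up data).
incomes = np.array([47_500, 52_000, 49_300, 55_100, 46_800,
                    51_200, 48_900, 53_400, 50_700, 45_600])

# Steps 4-5: ttest_1samp computes the t-statistic and the two-sided
# p-value under H0: mu = mu_0.
res = stats.ttest_1samp(incomes, popmean=mu_0)

# Steps 6-7: decide by comparing the p-value with alpha, then conclude.
decision = "reject H0" if res.pvalue < alpha else "fail to reject H0"
print(f"t = {res.statistic:.3f}, p = {res.pvalue:.3f} -> {decision}")
```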

Inferential statistics provides the tools and methods to bridge the gap between sample information and population knowledge, enabling data-driven decision-making in situations where complete population data is unavailable.



t-Test (One Sample and Two Independent Groups)


In inferential statistics, one of the most frequent tasks is comparing means. We might want to compare the mean of a single sample to a known or hypothesized value, or compare the means of two different groups. The t-test is a powerful and widely used statistical method for performing such comparisons, particularly when dealing with sample data and when the population standard deviation is unknown.

The t-test is based on the t-distribution (also known as Student's t-distribution), which is similar in shape to the normal distribution but has heavier tails. This means it has more probability in the tails and less in the center compared to the normal distribution. The shape of the t-distribution depends on a parameter called the degrees of freedom (df), which is related to the sample size. As the sample size (and thus the degrees of freedom) increases, the t-distribution approaches the standard normal distribution.
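This convergence is easy to see numerically: the two-tailed 5% critical value of the t-distribution shrinks toward the normal value $z_{0.025} \approx 1.96$ as the degrees of freedom grow. A quick sketch with SciPy (the chosen df values are arbitrary):

```python
from scipy import stats

# Two-tailed 5% critical values: t_{0.025, df} approaches z_{0.025} = 1.96.
for df in (5, 10, 24, 40, 100, 1000):
    print(f"df = {df:>4}: t-critical = {stats.t.ppf(0.975, df):.4f}")
print(f"normal     : z-critical = {stats.norm.ppf(0.975):.4f}")
```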


Assumptions of the t-Test

For the results of a t-test to be valid, certain assumptions about the data should ideally be met. While the t-test is relatively robust to minor violations of some assumptions, especially with larger sample sizes, it's important to be aware of them:

  • Random sampling: the data come from a random sample of the population of interest.
  • Independence: the observations are independent of one another.
  • Scale of measurement: the variable is measured on a continuous (interval or ratio) scale.
  • Approximate normality: the variable is approximately normally distributed in the population, or the sample size is large enough for the Central Limit Theorem to apply.

For the two-independent-samples t-test (especially the pooled variance version), an additional assumption is:

  • Homogeneity of variances: the two populations have equal variances ($\sigma_1^2 = \sigma_2^2$). When this assumption is doubtful, Welch's t-test (described below) is used instead.


One-Sample t-Test

The one-sample t-test is used to determine if the mean of a single sample is statistically different from a known or hypothesized value for the population mean. This hypothesized value is often based on prior research, a theoretical expectation, or a standard.

Purpose

To test whether the sample mean ($\bar{x}$) provides sufficient evidence to conclude that the true population mean ($\mu$) is different from a specific value ($\mu_0$).

Hypotheses

We formulate a null hypothesis ($H_0$) and an alternative hypothesis ($H_1$ or $H_a$) about the population mean ($\mu$) in relation to the hypothesized value ($\mu_0$):

  • Null hypothesis: $H_0: \mu = \mu_0$ (the population mean equals the hypothesized value).
  • Alternative hypothesis: $H_1: \mu \ne \mu_0$ (two-tailed), or $H_1: \mu > \mu_0$ or $H_1: \mu < \mu_0$ (one-tailed), depending on the research question.

Test Statistic

The one-sample t-statistic measures how many standard errors the sample mean ($\bar{x}$) is away from the hypothesized population mean ($\mu_0$). When the population standard deviation ($\sigma$) is unknown (which is typically the case when using a t-test), we use the sample standard deviation ($s$) to estimate the standard error of the mean. The standard error of the mean is $\frac{\sigma}{\sqrt{n}}$, and its estimate is $\frac{s}{\sqrt{n}}$.

The formula for the one-sample t-statistic is:

$\mathbf{t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}}$

... (1)

where:

  • $\bar{x}$ = the sample mean,
  • $\mu_0$ = the hypothesized population mean,
  • $s$ = the sample standard deviation, and
  • $n$ = the sample size.

Under the null hypothesis, this t-statistic follows a t-distribution with degrees of freedom $df = n - 1$. The degrees of freedom represent the number of independent pieces of information available to estimate the population variance.

Decision Rule

To make a decision about whether to reject $H_0$, we compare the calculated t-statistic to critical values from the t-distribution table or use the p-value approach. This comparison is done at a pre-determined significance level, $\alpha$:

  • Two-tailed test ($H_1: \mu \ne \mu_0$): reject $H_0$ if $|t| > t_{\alpha/2, \, n-1}$, or equivalently if the p-value is less than $\alpha$.
  • Right-tailed test ($H_1: \mu > \mu_0$): reject $H_0$ if $t > t_{\alpha, \, n-1}$.
  • Left-tailed test ($H_1: \mu < \mu_0$): reject $H_0$ if $t < -t_{\alpha, \, n-1}$.

Rejecting $H_0$ suggests that there is statistically significant evidence, based on the sample, to support the alternative hypothesis. Failing to reject $H_0$ means the sample data does not provide enough evidence to contradict the null hypothesis; it does *not* mean that $H_0$ is true.


Two Independent Samples t-Test

The two independent samples t-test (also called the unpaired t-test or independent groups t-test) is used to determine if there is a statistically significant difference between the means of two independent groups. "Independent" means that the individuals or observations in one group are unrelated to the individuals or observations in the other group (e.g., comparing the average height of males and females, where individuals are measured once and belong to only one group).

Purpose

To test whether the difference between the sample means of two independent groups ($\bar{x}_1$ and $\bar{x}_2$) is large enough to conclude that the true population means ($\mu_1$ and $\mu_2$) of the two groups are different.

Hypotheses

Let $\mu_1$ and $\mu_2$ be the true population means for the two groups.

  • Null hypothesis: $H_0: \mu_1 = \mu_2$ (equivalently, $\mu_1 - \mu_2 = 0$: there is no difference between the group means).
  • Alternative hypothesis: $H_1: \mu_1 \ne \mu_2$ (two-tailed), or $H_1: \mu_1 > \mu_2$ or $H_1: \mu_1 < \mu_2$ (one-tailed).

Test Statistic (Assuming Equal Variances - Pooled t-Test)

This version of the two-independent-samples t-test assumes that the population variances of the two groups are equal ($\sigma_1^2 = \sigma_2^2$). If this assumption is reasonable or justified, pooling the sample variances provides a better estimate of the common population variance.

First, we calculate the pooled sample variance ($s_p^2$):

$\mathbf{s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1 + n_2 - 2}}$

... (2)

where:

  • $s_1^2$ and $s_2^2$ = the sample variances of the two groups, and
  • $n_1$ and $n_2$ = the two sample sizes.

The terms $(n_1-1)$ and $(n_2-1)$ are the degrees of freedom for each sample variance. The denominator $n_1 + n_2 - 2$ is the total degrees of freedom for the pooled variance estimate.

The two-sample t-statistic measures the difference between the two sample means relative to the estimated standard error of the difference between means. The estimated standard error of the difference between means (when variances are pooled) is $\sqrt{s_p^2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$.

The formula for the pooled two-independent-samples t-statistic is:

$\mathbf{t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)_0}{\sqrt{s_p^2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}}$

... (3)

Under the null hypothesis $H_0: \mu_1 = \mu_2$, which means $\mu_1 - \mu_2 = 0$, the formula simplifies to:

$\mathbf{t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}}$

... (4)

where:

  • $\bar{x}_1$ and $\bar{x}_2$ = the two sample means,
  • $s_p^2$ = the pooled sample variance from formula (2), and
  • $n_1$ and $n_2$ = the two sample sizes.

Under the null hypothesis, this t-statistic follows a t-distribution with degrees of freedom $df = n_1 + n_2 - 2$.
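Formulas (2) and (4) translate directly into code. The sketch below (Python with SciPy; the summary numbers are invented placeholders, not from the examples in this chapter) computes the pooled test by hand and checks it against SciPy's `ttest_ind_from_stats`, which performs the same pooled test from summary statistics:

```python
import math
from scipy import stats

# Hypothetical summary statistics for two independent groups
# (illustrative numbers only).
n1, x1, s1 = 15, 62.0, 8.0
n2, x2, s2 = 18, 57.0, 9.0

# Formula (2): pooled sample variance.
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)

# Formula (4): pooled t-statistic under H0: mu1 = mu2.
t = (x1 - x2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
df = n1 + n2 - 2
print(f"pooled s_p^2 = {sp2:.3f}, t = {t:.3f}, df = {df}")

# SciPy computes the same pooled test (and a p-value) from summary stats.
print(stats.ttest_ind_from_stats(x1, s1, n1, x2, s2, n2, equal_var=True))
```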

Test Statistic (Assuming Unequal Variances - Welch's t-Test)

If the assumption of equal variances is not met (which can be tested using methods like Levene's test), or if the sample sizes are very different, it is more appropriate to use Welch's t-test. The formula for the test statistic is slightly different:

$\mathbf{t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}}$

... (5)

where:

  • $\bar{x}_1$ and $\bar{x}_2$ = the two sample means,
  • $s_1^2$ and $s_2^2$ = the two sample variances (used separately, not pooled), and
  • $n_1$ and $n_2$ = the two sample sizes.

The degrees of freedom for Welch's t-test are calculated using a more complex formula (the Welch-Satterthwaite equation), which usually results in a non-integer value. For practical purposes, software typically computes this value. In introductory contexts, the pooled t-test is often the focus.
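Continuing the sketch above with the same invented summary statistics, Welch's version changes only the standard error and the degrees of freedom; the Welch-Satterthwaite df can be computed by hand and checked against SciPy's `equal_var=False` option:

```python
import math
from scipy import stats

# Same hypothetical summary statistics as in the pooled sketch above.
n1, x1, s1 = 15, 62.0, 8.0
n2, x2, s2 = 18, 57.0, 9.0

v1, v2 = s1**2 / n1, s2**2 / n2   # per-group variance of the sample mean

# Formula (5): Welch's t-statistic (variances are NOT pooled).
t = (x1 - x2) / math.sqrt(v1 + v2)

# Welch-Satterthwaite degrees of freedom (usually a non-integer value).
df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
print(f"t = {t:.3f}, df = {df:.2f}")

# SciPy's unequal-variance test gives the same statistic plus a p-value.
print(stats.ttest_ind_from_stats(x1, s1, n1, x2, s2, n2, equal_var=False))
```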

Decision Rule

The decision rule for the two-sample t-test (pooled or Welch's) is the same as for the one-sample t-test, but using the appropriate degrees of freedom ($df = n_1 + n_2 - 2$ for pooled, or the calculated value for Welch's) and comparing the calculated t-statistic to the critical value(s) from the t-distribution or using the p-value.


Examples

Example 1 (One-Sample). A manufacturer claims that the average lifespan of a certain type of LED bulb is 10,000 hours. A quality control engineer tests a random sample of 25 bulbs and finds their average lifespan is 9,600 hours with a sample standard deviation of 1,500 hours. At a 5% significance level ($\alpha = 0.05$), can the engineer conclude that the manufacturer's claim is incorrect?

Answer:

Given:

  • Hypothesized population mean ($\mu_0$): 10,000 hours
  • Sample size ($n$): 25
  • Sample mean ($\bar{x}$): 9,600 hours
  • Sample standard deviation ($s$): 1,500 hours
  • Significance level ($\alpha$): 0.05

We want to determine if the sample mean (9,600 hours) is significantly different from the claimed population mean (10,000 hours). This suggests a two-tailed test.

Hypotheses:

Let $\mu$ be the true average lifespan of the LED bulbs.

$H_0: \mu = 10000$

[The true mean lifespan is 10,000 hours]

$H_1: \mu \ne 10000$

[The true mean lifespan is not 10,000 hours (manufacturer's claim is incorrect)]

Solution:

We use the one-sample t-test statistic formula:

$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$

[Formula (1)]

Substitute the given values:

$t = \frac{9600 - 10000}{1500 / \sqrt{25}}$

$t = \frac{-400}{1500 / 5}$

[Calculate numerator and square root]

$t = \frac{-400}{300}$

[Calculate denominator]

$t = -\frac{4}{3} \approx -1.333$

[Calculate t-statistic] ... (1)

The degrees of freedom are $df = n - 1 = 25 - 1 = 24$.

For a two-tailed test at $\alpha = 0.05$ with $df = 24$, we look up the critical t-values from a t-distribution table. The critical values are approximately $\pm 2.064$.

Decision: The calculated t-statistic is $t_{calculated} = -1.333$. The critical values are $t_{critical} = \pm 2.064$. We compare the absolute value of the calculated t-statistic to the positive critical value: $|-1.333| = 1.333$. Since $1.333 \le 2.064$, the calculated t-statistic falls within the acceptance region (i.e., it is not in the tails beyond the critical values). Therefore, we fail to reject the null hypothesis $H_0$.

Conclusion: At the 5% significance level, there is not enough statistical evidence from the sample to conclude that the true average lifespan of the LED bulbs is significantly different from the manufacturer's claim of 10,000 hours. The observed sample mean of 9,600 hours is not sufficiently far from 10,000 to reject the claim, given the sample variability.
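The hand computation above can be verified from the summary statistics alone (a sketch with SciPy; the p-value is not part of the worked example but follows from the same t-distribution):

```python
import math
from scipy import stats

n, x_bar, s, mu_0, alpha = 25, 9600.0, 1500.0, 10000.0, 0.05

t = (x_bar - mu_0) / (s / math.sqrt(n))     # formula (1): -1.333...
df = n - 1                                  # 24
t_crit = stats.t.ppf(1 - alpha / 2, df)     # approx 2.064
p_value = 2 * stats.t.sf(abs(t), df)        # two-tailed p-value

print(f"t = {t:.3f}, critical = +/-{t_crit:.3f}, p = {p_value:.3f}")
# t = -1.333 lies inside (-2.064, 2.064) and p > 0.05 -> fail to reject H0.
```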


Example 2 (Two Independent Samples). An educational researcher wants to compare the effectiveness of two different teaching methods (Method A and Method B) on student performance. Two independent groups of students are randomly assigned to each method. After the study period, a standardized test is given. The results are as follows:

| Group | Sample Size ($n$) | Sample Mean ($\bar{x}$) | Sample Standard Deviation ($s$) |
|---|---|---|---|
| Method A (Group 1) | $n_1 = 20$ | $\bar{x}_1 = 85$ | $s_1 = 10$ |
| Method B (Group 2) | $n_2 = 22$ | $\bar{x}_2 = 80$ | $s_2 = 12$ |

Assume that the population variances are equal. At a 1% significance level ($\alpha = 0.01$), is there a significant difference in the mean test scores between the two methods?

Answer:

Given:

  • Group 1 (Method A): $n_1 = 20$, $\bar{x}_1 = 85$, $s_1 = 10$. Sample variance $s_1^2 = 10^2 = 100$.
  • Group 2 (Method B): $n_2 = 22$, $\bar{x}_2 = 80$, $s_2 = 12$. Sample variance $s_2^2 = 12^2 = 144$.
  • Significance level ($\alpha$): 0.01
  • Assumption: Equal population variances.

We want to determine if the mean test score for Method A is significantly different from the mean test score for Method B. This suggests a two-tailed test comparing two independent group means.

Hypotheses:

Let $\mu_1$ be the true mean test score for Method A and $\mu_2$ be the true mean test score for Method B.

$H_0: \mu_1 = \mu_2$

[There is no difference in true mean scores between the methods]

$H_1: \mu_1 \ne \mu_2$

[There is a significant difference in true mean scores]

Solution:

Since we assume equal population variances, we use the pooled two-independent-samples t-test. First, calculate the pooled sample variance ($s_p^2$).

$s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1 + n_2 - 2}$

[Formula (2)]

Substitute the values:

$s_p^2 = \frac{(20-1)(100) + (22-1)(144)}{20 + 22 - 2}$

$s_p^2 = \frac{19 \times 100 + 21 \times 144}{40}$

$s_p^2 = \frac{1900 + 3024}{40} = \frac{4924}{40}$

$s_p^2 = 123.1$

[Pooled variance] ... (1)

Now, calculate the t-statistic using the pooled variance formula:

$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$

[Formula (4)]

Substitute the values:

$t = \frac{85 - 80}{\sqrt{123.1 \left(\frac{1}{20} + \frac{1}{22}\right)}}$

$t = \frac{5}{\sqrt{123.1 \left(\frac{22 + 20}{20 \times 22}\right)}} = \frac{5}{\sqrt{123.1 \left(\frac{42}{440}\right)}}$

[Combine fractions in denominator]

$t = \frac{5}{\sqrt{123.1 \times 0.095454...}}$

[Convert fraction to decimal or keep as fraction]

$t = \frac{5}{\sqrt{11.750...}}$

[Multiply inside square root]

$t \approx \frac{5}{3.428}$

[Take square root of denominator]

$t \approx 1.459$

[Calculate t-statistic] ... (2)

The degrees of freedom are $df = n_1 + n_2 - 2 = 20 + 22 - 2 = 40$.

For a two-tailed test at $\alpha = 0.01$ with $df = 40$, we look up the critical t-values from a t-distribution table. The critical values are approximately $\pm 2.704$.

Decision: The calculated t-statistic is $t_{calculated} = 1.459$. The critical values are $t_{critical} = \pm 2.704$. We compare the absolute value: $|1.459| = 1.459$. Since $1.459 \le 2.704$, the calculated t-statistic falls within the acceptance region. Therefore, we fail to reject the null hypothesis $H_0$.

Conclusion: At the 1% significance level, there is not enough statistical evidence to conclude that there is a significant difference in the true mean test scores between the two teaching methods. The observed difference in sample means (85 vs 80) could reasonably be due to random sampling variability.
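As with Example 1, the arithmetic can be checked from the summary statistics alone (a sketch using SciPy's `ttest_ind_from_stats`):

```python
from scipy import stats

# Summary statistics from the example.
n1, x1, s1 = 20, 85.0, 10.0
n2, x2, s2 = 22, 80.0, 12.0
alpha = 0.01

res = stats.ttest_ind_from_stats(x1, s1, n1, x2, s2, n2, equal_var=True)
df = n1 + n2 - 2                             # 40
t_crit = stats.t.ppf(1 - alpha / 2, df)      # approx 2.704

print(f"t = {res.statistic:.3f}, p = {res.pvalue:.3f}, "
      f"critical = +/-{t_crit:.3f}")
# t is about 1.459 with p well above 0.01 -> fail to reject H0.
```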